Abstract


We present a new online algorithm for detecting overlapping communities.
The main ingredients are a modification of an online k-means algorithm and
a new approach to modelling overlap in communities. An evaluation on large
benchmark graphs shows that the quality of the discovered communities
compares favourably to several methods in the recent literature, while the
running time is significantly improved.

1 Introduction 

A community in a graph is a set of nodes such that the density of connections 
between the nodes within the set is higher than the density of connections
between the set and its complement. Communities have been observed in a 
wide variety of real world graphs, such as scientific paper citation networks, 
friendship networks in social media, link graphs of the internet, transportation 
networks and protein-protein interaction networks, to name a few. Generally, 
members of the same community share similar application specific properties 
and communities can be regarded as higher level building blocks of the graphs. 

In many situations it is natural to assume that a node in a graph can belong
to several communities. For instance, a member of a social network can belong
to the communities 'Family', 'School' and 'Karate club'. A node in a
transportation network can belong to several communities if it is a hub on
the boundary of two or more regions.

Community detection is an active research field with a large and growing
literature. We refer to [6] for an extensive survey and a sample of
applications, and to [18], which specifically surveys overlapping community
detection methods.

As with many data mining problems, there are two main challenges in
community detection. The first is to detect communities as precisely as
possible. One common approach to measuring this is to run the algorithms on
a set of LFR benchmarks. The LFR benchmarks are models of random graphs with
a community structure that have certain characteristics resembling real
world graphs, such as power law degree distributions. More details are given
in Section 3.2. The quality of the communities produced by the algorithm is
then assessed by comparing them to the known ground truth communities of the
benchmarks, using the extended normalized mutual information (ENMI) measure,
defined in [15].

The other challenge is to design algorithms that can run on very large
graphs in a reasonable time. In recent years several methods for the
detection of overlapping communities were developed that scale to graphs
with millions of nodes. In particular, as reported in [9], the SVI algorithm
due to [9], the Poisson modelling algorithm due to [5], the COPRA algorithm
due to [?], and the INFOMAP algorithm due to [?] can produce non-trivial
results on LFR graphs with N = 1,000,000 nodes and about 750 overlapping
communities, each of size 2000 to 5000. It was found that the SVI and
Poisson algorithms produced an ENMI score of 0.8 on these graphs, while
COPRA and INFOMAP produced ENMI scores of 0.5 and 0.25 respectively. On the
other hand, the running time allocated to the SVI and Poisson algorithms was
24 hours, after which the algorithms were terminated and the current best
estimate of the communities was returned. The running times of COPRA and
INFOMAP were not specified.

In this paper we present a simple online algorithm, CLAGO (Cluster
Aggregation for Overlapping Communities), for detecting overlapping
communities. We evaluate our algorithm on a set of large benchmarks with the
same parameters as were considered in [5] and find that the performance in
terms of ENMI is similar to that of the SVI and Poisson algorithms, while
the running time of our algorithm is significantly better. In particular,
our algorithm produces an ENMI of 0.8 on the above mentioned N = 1,000,000
benchmark after 2.5 hours.

Our algorithm operates in two stages. In the first stage, we produce a
non-overlapping partition of the graph. This part of the algorithm takes as
input the number of nodes N and the number of communities to find, k
(however, see later remarks about pre-specifying the number of communities).
It maintains k vectors of length N as parameters, and it is assumed that the
algorithm is presented with the nodes of the graph one at a time, in a random
order. Each node is presented to the algorithm together with the set of its
neighbours, and for each node an update to the parameters is made. After all
nodes have been presented, we either terminate or proceed to another
iteration. The description of the update is given in Section 2.1. Curiously
enough, we find that the above mentioned N = 1,000,000 benchmarks contain
enough redundancy so that even a single iteration (about 30 min.) is
sufficient to obtain an ENMI as high as 0.75. In the second stage of the
algorithm we derive overlapping communities from the disjoint communities.
The general philosophy is that the probability that a node belongs to a
given community is proportional to the probability of hitting the node by a
random walk started at the community. Details are given in Section 2.2.

The rest of the paper is organized as follows: In Section 2 we define the
algorithm and derive some of its basic properties. Section 3 contains the
empirical evaluation of the algorithm.

2 Algorithm

Suppose we are given a graph G with N vertices, and we want to find k
overlapping communities in it. The algorithm proceeds in two stages. First
we partition the graph into k disjoint communities. Then we use a step of a
random walk from the disjoint parts to deduce the overlapping communities.

2.1 Disjoint Communities 

Denote the graph by $G = (V, E)$. For a node $x \in V$, denote by $d_x$ the
degree of $x$, and by $N_x$ the set of neighbours of $x$. Set $w_x$ to be the
distribution of one step of a random walk from $x$, that is, the uniform
measure on $N_x$. For a subset $S \subset V$ let $d_S = \sum_{x \in S} d_x$
be the total degree of the set $S$.

Let $p_1, \dots, p_k$ be randomly initialized probability measures on $V$.
Specifically, we use the uniform disjoint initialization: partition $V$ into
$k$ random subsets of equal size, $S_1, \dots, S_k \subset V$, and set
$p_i(x) = \mathbb{1}_{S_i}(x)/|S_i|$.
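As an illustration of these definitions, the following minimal Python sketch builds the one-step walk distribution, the total degree of a subset and the uniform disjoint initialization on a toy graph. The adjacency-dict representation, the function names and the toy graph are assumptions made for this sketch and are not taken from the original implementation; measures are stored as sparse dicts rather than as vectors of length N.

```python
import random

def walk_distribution(adj, x):
    """w_x: the uniform measure on the neighbours of x, as a sparse dict."""
    nbrs = adj[x]
    return {y: 1.0 / len(nbrs) for y in nbrs}

def total_degree(adj, subset):
    """d_S: the sum of the degrees of the nodes in subset."""
    return sum(len(adj[x]) for x in subset)

def uniform_disjoint_init(nodes, k, rng):
    """Split the nodes into k random parts of (nearly) equal size and
    return the uniform measure on each part."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    parts = [shuffled[j::k] for j in range(k)]
    return [{x: 1.0 / len(part) for x in part} for part in parts]

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p = uniform_disjoint_init(adj, k=2, rng=random.Random(0))
```

Each entry of `p` is a probability measure supported on one of the random parts, and the supports are disjoint by construction.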


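Algorithm 1 Disjoint Communities, CLAG
 1: Initialize $p_1, \dots, p_k$
 2: Initialize counters $m_1 = m_2 = \dots = m_k = 0$
 3: repeat
 4:   Set $x_1, \dots, x_N$ to be a random permutation of the nodes.
 5:   for $i \in 1, \dots, N$ do
 6:     $t \leftarrow \arg\max_{1 \le j \le k} \langle p_j, w_{x_i} \rangle$
 7:     $m_t \leftarrow m_t + d_{x_i}$
 8:     $p_t \leftarrow \left(1 - \frac{d_{x_i}}{m_t}\right) p_t + \frac{d_{x_i}}{m_t} w_{x_i}$
 9:   end for
10: until Stopping condition is met
11: For $t \le k$, set
12:   $C_t = \{x \in V \mid t = \arg\max_{1 \le j \le k} \langle p_j, w_x \rangle\}$
13: Return the partition $C_1, \dots, C_k$, and the parameters $\{p_j\}$, $\{m_j\}$.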


The stopping criterion on line 10 can be chosen in any common way. We have
found that simply limiting the number of iterations to somewhere from 5 to
15 is usually sufficient.
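The loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration and not the implementation used for the experiments: the sparse-dict storage, the fixed iteration count as stopping rule, the first-index tie-breaking and the function names are all assumptions of the sketch, and every node is assumed to have at least one neighbour.

```python
import random

def inner(p, w):
    """Inner product <p, w> of two sparse measures stored as dicts."""
    return sum(v * p.get(y, 0.0) for y, v in w.items())

def clag(adj, k, iterations=10, seed=0):
    """Sketch of Algorithm 1; adj maps each node to its neighbour list."""
    rng = random.Random(seed)
    nodes = list(adj)

    def w(x):                      # one random-walk step from x
        return {y: 1.0 / len(adj[x]) for y in adj[x]}

    def best(wx):                  # argmax_j <p_j, w_x>; first index wins ties
        return max(range(k), key=lambda j: inner(p[j], wx))

    # Uniform disjoint initialization of p_1..p_k; counters m_j start at 0.
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    p = [{x: 1.0 / len(part) for x in part}
         for part in (shuffled[j::k] for j in range(k))]
    m = [0] * k

    for _ in range(iterations):    # stopping rule: fixed number of passes
        rng.shuffle(nodes)
        for x in nodes:
            wx, d = w(x), len(adj[x])
            t = best(wx)
            m[t] += d
            a = d / m[t]
            for y in p[t]:         # p_t <- (1 - a) p_t + a w_x
                p[t][y] *= 1.0 - a
            for y, v in wx.items():
                p[t][y] = p[t].get(y, 0.0) + a * v

    parts = [[] for _ in range(k)]
    for x in adj:                  # final assignment with the last parameters
        parts[best(w(x))].append(x)
    return parts, p, m

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
parts, p, m = clag(adj, k=2, iterations=5, seed=1)
```

Rescaling the whole support of `p[t]` at every update keeps the sketch short but is not efficient; an implementation aimed at large graphs would need a cheaper representation of the update.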

The inner product on line 6 is crucial to the algorithm. Indeed, if the inner
product

$$\langle p_j, w_{x_i} \rangle \tag{1}$$

is replaced by the negative squared Euclidean distance,

$$-|p_j - w_{x_i}|^2 = -|w_{x_i}|^2 + 2\langle p_j, w_{x_i} \rangle - |p_j|^2, \tag{2}$$

then Algorithm 1 is the online k-means algorithm in the form given in [3]
(with the caveat that every $w_x$ comes with multiplicity $d_x$). We find
empirically that the use of (1) instead of (2) results in significantly
better quality of the partitions found. Let us mention one possible reason
why (1) performs better than (2). The difference between the expressions is
the term $|p_j|^2$, which is small for spread-out, large support measures.
This can, for instance, be observed on the Political Blogs example [1] (see
Section 3.1), where Algorithm 1 with cost (2) finds one huge component
instead of two smaller ones. However, the cost (1) has a bias towards small
support measures. As mentioned in Section 3.2.2, on large graphs with high
$k$, Algorithm 1, CLAG, tends to produce, in addition to the true partition
of the graph, some small components. These components can easily be detected
by their size and pruned.
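To make the remark about the squared-norm term concrete, consider the simplest case, added here only as an illustration: a measure $p$ that is uniform on a set $S \subset V$.

```latex
% Worked example (illustrative assumption: p is uniform on S)
|p|^2 = \sum_{x \in S} \frac{1}{|S|^2} = \frac{1}{|S|}
```

Under this assumption, maximizing the cost (2) over $j$ is the same as maximizing twice the cost (1) minus the penalty $1/|S|$, since the term that depends only on the presented node does not affect the argmax. The penalty is negligible for large supports and substantial for small ones, which is consistent with cost (2) favouring one spread-out component and with cost (1), which has no such penalty, being comparatively biased towards small-support measures.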

In the rest of this section we describe several basic properties of the CLAG
algorithm. First we discuss the shape of the parameters $p_j$ when the
algorithm is close to convergence.

Denote by $p_j^s$ and $m_j^s$ the values of the parameters $p_j$ and $m_j$ at
the start of iteration $s$ of the “repeat” loop of the algorithm. For each
$x \in V$, set $C^s(x)$ to be the parameter index to which $x$ was assigned
during iteration $s$. In other words, $C^s(x)$ is the value of $t$ that was
assigned in line 6 when $x_i$ was $x$, on the $s$-th iteration of the
“repeat” loop. We say that the algorithm is in a stationary state at
iteration $s$ if for all $x \in V$ and all $s' > s$,

$$C^{s'}(x) = C^s(x). \tag{3}$$

Clearly, if the parameters converge to some limiting values, then the
algorithm will enter the stationary state for large enough $s$ (assuming some
consistent rule for breaking ties at line 6).

Denote by $\pi$ the stationary measure of a random walk on $G$,

$$\pi(x) = d_x/d_V. \tag{4}$$

For a subset $C \subset V$, denote by $\pi_C$ the restriction of $\pi$ to $C$.
Namely, $\pi_C(x) = d_x/d_C$ if $x \in C$ and $\pi_C(x) = 0$ otherwise. Set

$$\mu_C = \frac{1}{d_C} \sum_{x \in C} d_x \cdot w_x. \tag{5}$$

Then $\mu_C$ is the distribution of a random walk that was started from
$\pi_C$ and performed one step.
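The one-step measure defined in (5) can be computed with a single pass over the edges leaving the set. The sketch below is illustrative only; the function name, the sparse-dict output (called `mu_C` in the code) and the toy graph are assumptions.

```python
def one_step_measure(adj, C):
    """(1/d_C) * sum over x in C of d_x * w_x, as a sparse dict, where w_x
    is uniform on the neighbours of x and d_C is the total degree of C."""
    d_C = sum(len(adj[x]) for x in C)
    mu_C = {}
    for x in C:
        for y in adj[x]:           # d_x * w_x puts mass 1 on each neighbour
            mu_C[y] = mu_C.get(y, 0.0) + 1.0 / d_C
    return mu_C

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
mu = one_step_measure(adj, [0, 1, 2])
```

For the first triangle the total degree is 7, so each node of the triangle receives mass 2/7 and node 3, reached only through the bridging edge, receives 1/7.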

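Lemma 2.1. Assume that Algorithm 1 is at a stationary state at some iteration
$s$. For $j \le k$, let

$$P_j = \{x \in V \mid C^s(x) = j\} \tag{6}$$

be the set of nodes assigned to $p_j$. Then for all $j \le k$,

$$p_j^{s'} \to \mu_{P_j} \tag{7}$$

and

$$\frac{m_j^{s'}}{\sum_{i \le k} m_i^{s'}} \to d_{P_j}/d_V \tag{8}$$

as $s' \to \infty$.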







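Proof. By the update rules at lines 7 and 8, the following more general
relation holds for all $s \ge 1$:

$$p_j^{s+1} = \frac{m_j^s}{m_j^{s+1}}\, p_j^s + \frac{d_{P_j^s}}{m_j^{s+1}}\, \mu_{P_j^s}, \tag{9}$$

where $P_j^s = \{x \in V \mid C^s(x) = j\}$. The claim then follows from the
stationarity assumption.

□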
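The step from the update rules to the statement of the lemma can be spelled out as follows. This is an expansion of the argument added for the reader, under the additional assumption that the set of nodes assigned to index $j$ has positive total degree (a parameter to which no node is assigned is never updated).

```latex
% One pass over the nodes, written for the unnormalized quantity m_j p_j.
% Each node x assigned to j adds d_x to m_j and d_x w_x to m_j p_j, so
m_j^{s+1} = m_j^{s} + \sum_{x:\, C^s(x)=j} d_x , \qquad
m_j^{s+1} p_j^{s+1} = m_j^{s} p_j^{s} + \sum_{x:\, C^s(x)=j} d_x\, w_x .
% In a stationary state the set P_j = \{x : C^s(x) = j\} is the same in
% every later pass, hence after n = s' - s passes
m_j^{s'} = m_j^{s} + n\, d_{P_j} , \qquad
p_j^{s'} = \frac{m_j^{s} p_j^{s} + n \sum_{x \in P_j} d_x\, w_x}
                {m_j^{s} + n\, d_{P_j}}
\;\xrightarrow[n \to \infty]{}\; \frac{1}{d_{P_j}} \sum_{x \in P_j} d_x\, w_x ,
% which is the right-hand side of (5) with C = P_j, and
\frac{m_j^{s'}}{\sum_{i \le k} m_i^{s'}}
= \frac{m_j^{s} + n\, d_{P_j}}{\sum_{i \le k} m_i^{s} + n\, d_V}
\;\xrightarrow[n \to \infty]{}\; \frac{d_{P_j}}{d_V} .
```

The last equality uses that the sets $P_1, \dots, P_k$ partition $V$, so their total degrees sum to $d_V$.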


There are two main consequences of Lemma 2.1. First, the particular shape of
the parameter $p_j$ will be useful for the deduction of overlapping
communities in the next section. More importantly, however, Lemma 2.1
provides us with a far-reaching interpretation of the quantity
$\langle w_x, p_j \rangle$. Note that according to (9), the initial values of
the parameters $p_j$, produced by random initialization, are erased quite
quickly. Indeed, after the first iteration, the weight of the initial value
$p_j^1$ in $p_j^2$ is $m_j^1/m_j^2$. Since the number of components $k$ is
usually small compared to the total degree of the graph, and the original
$p_j$'s are disjointly supported, there will be many sets $P_j$ with high
$d_{P_j}$. Therefore $p_j$ will typically look like a convex combination of a
few measures of the form $\mu_C$, for some subsets $C \subset V$. With this in
mind, for a node $x \in V$ and a subset $C \subset V$, let us interpret the
quantity $\langle w_x, \mu_C \rangle$. For any $y \in V$, note that


$$\mu_C(y) = \frac{|N_y \cap C|}{d_C} = \frac{\#(\text{edges from } y \text{ to } C)}{d_C}. \tag{10}$$

Thus,

$$\langle w_x, \mu_C \rangle = \frac{1}{d_x} \sum_{y \in N_x} \frac{|N_y \cap C|}{d_C}, \tag{11}$$

and the above sum is the number of paths of length two from $x$ to $C$,
normalized by the total degree of $C$. Thus, assuming $p_j$ is close to the
form $\mu_C$ for some $C$, the cost (1) prefers measures $p_j$ with a large
number of second order neighbours of $x$ and a small total degree.
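The path-counting interpretation of (11) can be checked directly on a small example. The sketch below counts paths of length two from a node into a set and normalizes by the degree of the node and the total degree of the set; the function name and the toy graph are assumptions made for illustration.

```python
def two_step_score(adj, x, C):
    """The inner product in (11): number of paths x -> y -> z with z in C,
    divided by d_x * d_C."""
    members = set(C)
    d_C = sum(len(adj[z]) for z in members)
    paths = sum(1 for y in adj[x] for z in adj[y] if z in members)
    return paths / (len(adj[x]) * d_C)

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
own = two_step_score(adj, 2, [0, 1, 2])      # 5 paths, d_x = 3, d_C = 7
other = two_step_score(adj, 2, [3, 4, 5])    # 2 paths, d_x = 3, d_C = 7
```

Node 2 sits on the bridge between the two triangles, yet its score for its own triangle (5/21) is more than twice its score for the other one (2/21), while a direct neighbour count would give the ratio 2 to 1.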

From the above discussion it follows that CLAG works by maintaining its
current estimates of the communities and aggregating nodes towards the
communities that are most similar to the nodes. Of course, this is the
operating scheme of many community detection algorithms. Perhaps the closest
in spirit to our algorithm is the label propagation algorithm for
non-overlapping communities due to [16]. In this algorithm, one simply
assigns a node $x$ to the community which contains the maximal number of
$x$'s neighbours, among all the existing communities. The above mentioned
COPRA algorithm, [?], is a particular extension of the label propagation
algorithm to overlapping communities. Some even earlier examples of
algorithms that use “per node” iteration schemes and neighbourhood based
decisions are the works [?] and [3]. The distinctive feature of the CLAG
algorithm is that, as mentioned earlier, the cost
$\langle w_x, \mu_C \rangle$ implicitly counts neighbours of $x$ at distance
two rather than direct neighbours, and thus provides a less noisy estimate of
whether node $x$ should belong to community $C$.

2.2 Overlapping Communities

Suppose that for a graph $G = (V, E)$ we obtained a partition
$C_1, \dots, C_k$ into disjoint communities from the CLAG algorithm. It is
natural to assume that a node should be a member of a given community if it
has many links to other members of this community. Specifically, for a node
$x \in V$, we can define its membership in a given community using the
probability of reaching $x$ by a step of a random walk started at that
community. In this way, a node can be considered a member of several
communities if the probability of reaching it from these communities is
relatively high. We now define this formally.

Recall that we denote by $\pi$ the stationary measure of the random walk on
$G$, (4). For any partition $C_1, \dots, C_k$ of $V$ the following
decomposition holds:

$$\pi = \sum_{j=1}^{k} \pi(C_j)\, \mu_{C_j}. \tag{12}$$


Indeed, the right-hand side describes the distribution of a process of
choosing one of the components $C_j$ at random (with probabilities
$\pi(C_j)$), and making a step of a random walk from that component. The
equality (12) then states the invariance of $\pi$ under the random walk.
Conversely, suppose the random walk hit a node $x$. Denote by $\gamma_x(j)$
the probability that the component $C_j$ was chosen given that the node $x$
was hit. Then

$$\gamma_x(j) = \frac{\pi(C_j)\, \mu_{C_j}(x)}{\pi(x)}. \tag{13}$$


We regard the $\gamma_x(j)$'s as a probabilistic membership model. A node $x$
will be considered a member of community $C_j$ with probability
$\gamma_x(j)$.

The benchmarks for overlapping communities are usually binary, such that a
node is either a member of a certain community or not. We can derive such
binary assignments by simple thresholding of $\gamma_x(j)$. Specifically, fix
some threshold value $\alpha \in [0,1]$. The value $\alpha = 0.5$ works well
in most cases. For a node $x \in V$, set $s = \arg\max_{j \le k} \gamma_x(j)$
to be the index of a community with maximal hit probability at $x$. Then
assign $x$ to the communities in $\Gamma_x$, where

$$\Gamma_x = \{ j \le k \mid \gamma_x(j) \ge \alpha\, \gamma_x(s) \}. \tag{14}$$
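The membership model and the thresholding rule (14) can be sketched in a few lines. Substituting (4) and (10) into (13), the factors involving the total degree of the graph and of the community cancel, so the membership of a node in a community reduces to the fraction of its neighbours that lie in that community; this simplification is a remark added here, and the function name, data layout and toy graph below are assumptions made for illustration.

```python
def memberships(adj, partition, alpha=0.5):
    """Return (gamma, Gamma): gamma[x][j] is the probability that a walk
    which hit x started from community j; Gamma[x] is the thresholded set
    of community indices assigned to x."""
    index = {x: j for j, C in enumerate(partition) for x in C}
    gamma, Gamma = {}, {}
    for x, nbrs in adj.items():
        g = [0.0] * len(partition)
        for y in nbrs:             # each neighbour in C_j contributes 1/d_x
            g[index[y]] += 1.0 / len(nbrs)
        top = max(g)
        gamma[x] = g
        Gamma[x] = [j for j, v in enumerate(g) if v >= alpha * top]
    return gamma, Gamma

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
partition = [[0, 1, 2], [3, 4, 5]]
gamma, Gamma = memberships(adj, partition, alpha=0.3)
```

With the threshold 0.3 used in this toy call, the two bridge nodes 2 and 3 are assigned to both communities and the remaining nodes to one each; a stricter threshold such as 0.9 keeps every node in a single community.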


Up to this point we described how to obtain the overlapping communities
$\Gamma_x$ from the measures $\mu_{C_j}$ and the weights $\pi(C_j)$. We note
that there are two ways of obtaining these parameters from the output of
CLAG. One possibility is to directly construct the measures from the sets
$C_j$, by iterating one time over the graph and summing the weighted measures
$d_x w_x$ over the $x \in C_j$, as in (5). Alternatively, note that by
Lemma 2.1, assuming the algorithm terminated in a near-convergence state, the
parameters $p_j$ are already in a form close to $\mu_{C_j}$ and the values
$\pi_j = m_j / \sum_{i \le k} m_i$ approximate $\pi(C_j)$. We therefore could
use these parameters directly in equation (13), thus avoiding an additional
iteration over the graph. Although we believe the second approach should work
well, in this paper we experimented only with the first approach.

For future reference, we formalize the overlapping communities algorithm as
Algorithm 2, to which we refer as CLAGO.


Algorithm 2 Overlapping Communities, CLAGO
1: Input: Graph $G$, number of components $k$,
2:   threshold parameter $\alpha$.
3: Apply CLAG to obtain a partition $C_1, \dots, C_k$.
4: Compute the parameters $\pi(C_j)$, $\mu_{C_j}$ and $\gamma_x$ using
5:   equations (4), (5), and (13).
6: Return the communities $\bar{C}_1, \dots, \bar{C}_k$, where
7:   $\bar{C}_j = \{x \in V \mid j \in \Gamma_x\}$


2.3 Additional Remarks 

Most of the computation of CLAG is done in the loop of lines 5 to 9. We
believe that parallelizing this loop over multiple processors should be
possible; however, the parallelization is not trivial. While performing the
random walk can be done in parallel, the update to each of the $p_t$ has to
be done using mini-batching, and some care has to be taken in syncing the
mini-batch updates. We leave the challenge of parallelizing the algorithm for
future work.

3 Evaluation 

In this section we present the experimental evaluation of the CLAG and CLAGO
algorithms. In Section 3.1 we illustrate the CLAG algorithm on two well known
small benchmark graphs. Section 3.2.1 contains the evaluation of the CLAG
algorithm on non-overlapping benchmarks with parameters that were used in the
benchmark paper [7]. In Section 3.2.2 we provide the comparison of the CLAGO
algorithm with the results that were given in [?].

3.1 Some Standard Examples 

Figure 1 shows the classical Zachary’s Karate Club graph, [19]. This graph
has 34 nodes and a ground partition into two subsets. The partition shown in
Figure 1 is a partition obtained from a typical run of CLAG with k = 2. It
coincides with the ground partition except for one node, node 8, which is
often misclassified by community detection algorithms (see [6]). We note
that because the graph is small, CLAG is somewhat sensitive to the random
initialization. Some invocations of the algorithm would produce the wrong
partition. A simple common way to obtain consistent results is to restart the
algorithm 3 times and to choose the partition for which some cost, such as
for instance modularity, [5], is maximal.
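The restart strategy can be sketched as follows, using the standard Newman-Girvan modularity as the selection cost. The callable `detect(adj, k, seed)` is a hypothetical stand-in for any routine that returns a partition as a list of node lists; it, the function names and the toy graph are assumptions of this sketch.

```python
def modularity(adj, partition):
    """Newman-Girvan modularity: sum over communities C of
    (edges inside C) / m - (d_C / (2 m))^2, with m the number of edges."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    q = 0.0
    for C in partition:
        members = set(C)
        d_C = sum(len(adj[x]) for x in members)
        # Each inside edge is seen from both endpoints, so this is 2 * e_C.
        inside = sum(1 for x in members for y in adj[x] if y in members)
        q += inside / two_m - (d_C / two_m) ** 2
    return q

def best_of_restarts(adj, k, detect, restarts=3):
    """Run detect with different seeds and keep the partition with the
    highest modularity."""
    runs = [detect(adj, k, seed) for seed in range(restarts)]
    return max(runs, key=lambda part: modularity(adj, part))

# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
q_split = modularity(adj, [[0, 1, 2], [3, 4, 5]])
```

On the toy graph the split into the two triangles has modularity 5/14, while putting all nodes into one community gives 0, so the selection step prefers the split.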
Figure 1: Karate Club Graph


Figure 2 depicts the political blogs graph, [1]. The nodes are political
blogs, and the graph has an (undirected) edge if one of the blogs had a link
to the other. There are 1222 nodes in the graph. The ground truth partition
of this graph has two components, the right wing and the left wing blogs. The
labelling of the ground truth was partially automatic and partially manual,
and both processes could introduce some errors. CLAG consistently
reconstructs the ground truth partition with only 57 to 60 nodes
misclassified. These results are similar to the results obtained by other
methods for this graph, [?].

3.2 LFR Benchmarks 

The LFR benchmark, [?], [13], is a model of a random graph with communities,
such that the node degrees and community sizes have power law distributions,
as often observed in real graphs. An important parameter of this model is the
mixing parameter $\mu \in [0,1]$, which controls the fraction of the edges of
a node that go outside the node's community (or outside all of the node's
communities, in the overlapping case). For small $\mu$, there will be a small
number of edges going outside the communities, leading to disjoint, easily
separable graphs, and the boundaries between communities will become less
pronounced as $\mu$ grows. The model is generated roughly as follows: First
one samples the degrees from a specified power law and assigns them to nodes.
Then one samples community sizes from another power law, until the sizes sum
up to the total size of the graph (or more accordingly, in the overlapping
case). One can also clamp the power laws, so that community sizes and degrees
will fall within predefined bounds. Then one connects the nodes at random, in
a way that preserves the degrees and the mixing coefficient. The paper [?]
introduces the benchmarks for non-overlapping communities and in [?] the
overlapping communities case is treated.

Figure 2: Political Blogs Graph

The quality of communities found by an algorithm will be measured by a
version of the normalized mutual information with respect to the ground truth
communities. Given two partitions $P, Q$ of a set $V$, the normalized mutual
information between the partitions is

$$NMI(P, Q) = \frac{2\, I(P, Q)}{H(P) + H(Q)}, \tag{15}$$

where $H$ is the Shannon entropy of the partition and $I$ is the mutual
information between the partitions (see [?], [?]). An important property of
the NMI is that it is equal to 1 if and only if the partitions $P$ and $Q$
coincide, and it takes values between 0 and 1 otherwise.
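For non-overlapping partitions, (15) can be evaluated directly from label counts. The following sketch covers only this plain NMI and not the extension to overlapping communities; the representation of a partition as a list of labels (one per node, in a fixed node order) and the function name are assumptions.

```python
from collections import Counter
from math import log

def nmi(labels_p, labels_q):
    """Normalized mutual information 2 I(P, Q) / (H(P) + H(Q)) of two
    partitions given as label sequences over the same nodes."""
    n = len(labels_p)
    count_p, count_q = Counter(labels_p), Counter(labels_q)
    joint = Counter(zip(labels_p, labels_q))

    def entropy(counts):
        return -sum(v / n * log(v / n) for v in counts.values())

    h_p, h_q = entropy(count_p), entropy(count_q)
    i_pq = sum(v / n * log(v * n / (count_p[a] * count_q[b]))
               for (a, b), v in joint.items())
    if h_p + h_q == 0:             # both partitions consist of a single block
        return 1.0
    return 2 * i_pq / (h_p + h_q)

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # same partition, renamed labels
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent
```

The measure depends only on the partitions and not on the label names, so relabelled copies of the same partition score 1, while the two independent partitions in the example score 0.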

In NMI, the sets inside $P$, $Q$ cannot overlap. An extension of NMI to the
overlapping case was proposed in [?] and also has the property of being equal
to 1 if and only if the communities coincide. This extension is used to
evaluate the results in [?] and is also used in [7], where a number of
non-overlapping community detection algorithms are compared. Note that in
both these papers the extended NMI is denoted by NMI. Here we will refer to
the extended NMI by ENMI. The values of ENMI are usually lower than the
values of the NMI.

3.2.1 Non-overlapping case 

Figure 3 shows the results of running CLAG on graphs generated from the
non-overlapping LFR model, with N = 1000 and N = 5000 nodes, where the
community sizes were allowed to range between 10 and 50 (denoted by S in the
graph), and between 20 and 100 (denoted by B), and where $\mu$ ranges between
0 and 0.8 in steps of 0.1. For all the graphs, the average degree is 20, the
maximum degree is 50, the exponent of the degree distribution is $-2$ and
that of the community size distribution is $-1$. These parameters correspond
to the experiments in [7].

The x axis of Figure 3 is the mixing parameter $\mu$ and the y axis is the
ENMI. Each point on the graph is an average of the ENMI over 20 instances of
the random graphs with a given parameter set. We have not used restarts for
these experiments, and we have set the number of communities k to be the true
number of communities for each instance.

The results in Figure 3 indicate that CLAG performs better than most of the
algorithms tested in [7]. These results are also similar to those of the
Poisson model algorithm on this benchmark in [2].

3.2.2 Overlapping case 

For this section we use the overlapping LFR model, as defined in [?], with
the same parameters as were used for evaluation in [?]. In addition to the
parameters that are present in the non-overlapping model, in the overlapping
LFR model one specifies the parameters h and m: h is the number of nodes that
participate in multiple communities, and each such node will be a member of m
communities. The rest of the nodes will participate in exactly one community
each. For all the experiments in this section, h = N/2, where N is the total
number of nodes, and m = 4. The average degree is 60 for all the graphs. The
maximum degree, max_k, and the minimal and maximal community sizes, min_c and
max_c, are functions of N. Their values are specified in Table 1. The
exponents of the degree and the community size distributions are the
defaults, $-2$ and $-1$. For all the experiments in this section we have run
the CLAG algorithm with 15 iterations of the main loop, except for the
N = 1000000 case, where we have used 5 iterations.

There are two experiments that are performed in this section. In the first
experiment we create graphs of size N = 10000, with the parameters as
described above, and the mixing parameter $\mu$ varies between 0 and 0.7 in
steps of 0.1. For each value of the parameter $\mu$ we create 10 instances of
the model and apply CLAGO.

Figure 3: LFR benchmarks, ENMI

Note that in non-overlapping graphs, the case $\mu = 0$ would be trivial
since the communities would correspond simply to the connected components of
the graph. However, when communities can share nodes this is no longer the
case.

An interesting property of CLAG that was revealed by the experiments is that
if one starts with a number of components significantly higher than the true
number, the algorithm will retain only the necessary number. Namely, if the
starting k is high, then the algorithm will return with many empty sets
$C_j$, and the number of the non-empty ones will be close to the true k.
Therefore, we only need to choose a high enough k to start with. The number of
communities of the N ≤ 100000 graphs in this section is strongly concentrated
around 75 (it is between 72 and 78 for all instances), and for N = 1000000
this number is 750. Consequently, for all graphs with N ≤ 100000 we set
k = 150 and for N = 1000000 we set k = 1500. We note, however, that this
convergence of the number of components seems to depend on the topological
complexity of the graph. It does not happen on the benchmarks of the previous
section for higher values of $\mu$.
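The reported community counts can be cross-checked against the benchmark parameters with a back-of-envelope estimate. The sketch below is not part of the original evaluation. It assumes a continuous power law with exponent $-1$ truncated to the interval between the minimal and maximal community sizes, whose mean is (max_c - min_c) / ln(max_c / min_c), and it divides the total number of community memberships by that mean. The actual generator is discrete and stops once the sizes sum up, so the result is only approximate.

```python
from math import log

def expected_communities(n, min_c, max_c, h_fraction=0.5, m=4):
    """Rough estimate of the number of communities in an overlapping LFR
    graph: total memberships divided by the mean community size."""
    # n * (1 - h_fraction) nodes have one membership, the rest have m each.
    memberships = n * (1 - h_fraction) + n * h_fraction * m
    # Mean of a density proportional to 1/s on [min_c, max_c] (assumption).
    mean_size = (max_c - min_c) / log(max_c / min_c)
    return memberships / mean_size

# (N, min_c, max_c) as listed in Table 1; h = N/2 and m = 4 as in the text.
estimates = {n: expected_communities(n, lo, hi)
             for n, lo, hi in [(1000, 20, 50), (10000, 200, 500),
                               (100000, 2000, 5000), (1000000, 2000, 5000)]}
```

Under these assumptions the estimate is roughly 76 for the three smaller sizes and roughly 760 for N = 1000000, in line with the counts reported above.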
Another, possibly related, feature of the algorithm that was observed on the
benchmarks is that some of the sets $C_j$ that are returned have unreasonably
small sizes. For instance, for the N = 100000 benchmarks, where the minimal
community size is 200, some of the returned sets were of sizes less than 10.
As a general rule, unless one expects to have small-size communities, one can
prune these sets from the final results.
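The pruning step amounts to a size filter on the returned sets. A minimal sketch, with a hypothetical function name and made-up community sizes:

```python
def prune_small(communities, min_size):
    """Drop returned sets with fewer than min_size nodes; this also drops
    the empty sets left over from an overestimated k."""
    return [c for c in communities if len(c) >= min_size]

# Hypothetical output of a run: two plausible communities, one empty set
# and one tiny spurious set.
returned = [list(range(300)), [], list(range(300, 304)), list(range(304, 650))]
kept = prune_small(returned, min_size=20)
```

The threshold is a user choice; it should stay well below the smallest community size one expects to find, so that genuine communities are not removed.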

We now return to the description of the experiments. Figure 4 contains the
results of the first experiment, as described above, and shows the value of
ENMI against the mixing coefficient $\mu$. Each point is an average of the
evaluations over 10 random instances. The standard deviation of the results
at each $\mu$ was nearly zero. We show the results with and without pruning.
With pruning, communities of sizes less than 20 were removed from the final
results. The performance without pruning is close to but higher than that of
all the algorithms that were considered in [?], and the performance with
pruning improves further. The running times will be discussed separately at
the end of this section.

In the second experiment in this section we evaluate the performance of
CLAGO on graphs with sizes N = 1000, 10000, 100000 and 1000000 with mixing
parameter $\mu = 0$. The results are given in Table 2. Each row represents an
average of 10 instances. The first two columns are the average ENMI and the
standard deviation of that average, the third column is the average number of
ground truth communities in the graph and the last column is the average
number of communities returned by the algorithm. The standard deviations of
these averages were less than 5, and less than 10 for the N = 1000000 case.
For N = 100000 and N = 1000000 we also perform pruning at a community size of
200, and Table 3 shows the results of the same runs after the pruning.

As mentioned earlier, for N = 10000 our results are close to but slightly
better than all the results in [?]. For N = 1000000 our results are
practically the same as those of the SVI and Poisson model algorithms, and
pruning can further improve the results. For N = 100000 our results are
worse, due to the redundancy of many sets in the returned partition, and
pruning increases the ENMI significantly. An alternative to pruning in this
case could be using a lower k from the start. In the real world this would
mean obtaining a better estimate of the number of communities before running
the algorithm. One could, for instance, use the same procedure that is used
in [?] for the SVI and Poisson algorithms.

Finally, the experiments were performed on a standard PC with an i7-4770 CPU
at 3.40GHz under Ubuntu. The running time for the N = 1000 cases was less
than a second. The running time on the N = 10000 instances was 5.5 seconds
per instance, for 15 iterations of the algorithm. The running time for
N = 100000 was 3 minutes for 15 iterations. The running time for N = 1000000
was 155 minutes for 5 iterations of the algorithm, about 30 minutes per
iteration. The time that was allocated for the N = 1000000 case for the SVI
and Poisson algorithms in [5] was 24 hours. We conclude that the CLAGO
algorithm achieves a similar or even slightly better performance in a
significantly shorter time.

Note that in the N = 1000000 case, both the size of the graph and the number
of communities grew by a factor of 10 compared to the N = 100000 case. The
running time of the algorithm depends on the product of these quantities,
which explains the jump between these two cases.
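The scaling argument can be made concrete with the timings quoted above. The short calculation below is an illustration added here; it assumes that the cost per iteration is simply proportional to the product of the number of nodes and the starting number of components (k = 150 and k = 1500 in the two cases).

```python
def per_iteration_seconds(total_minutes, iterations):
    """Average wall-clock time of one pass over the graph, in seconds."""
    return total_minutes * 60 / iterations

small = per_iteration_seconds(3, 15)     # N = 100000, k = 150
large = per_iteration_seconds(155, 5)    # N = 1000000, k = 1500

# Growth of the product N * k between the two cases.
scale = (1_000_000 * 1500) / (100_000 * 150)
predicted = small * scale                # seconds per iteration if cost ~ N * k
ratio = large / predicted                # observed time relative to prediction
```

The product grows by a factor of 100, which predicts 1200 seconds (20 minutes) per iteration against the observed 1860 seconds (31 minutes). The prediction is of the right order of magnitude, and the remaining factor of about 1.5 is not explained by this simple model.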


Table 1: Parameter Settings For LFR Graphs

N          max_k    min_c    max_c
1000       100      20       50
10000      100      200      500
100000     316      2000     5000
1000000    1000     2000     5000



Figure 4: Overlapping LFR, N = 10,000, ENMI